On the equivalence of two expected average reward criteria for zero-sum semi-Markov games

Author

  • Anna Jaśkiewicz
Abstract

In this paper we study two basic optimality criteria used in the theory of zero-sum semi-Markov games. Under the first, the average reward for player 1 is the lim sup of the expected total reward over a finite number of jumps divided by the expected cumulative time of these jumps. Under the second, the average reward for player 1 is the lim sup of the expected total reward over a finite deterministic horizon divided by the length of the horizon. We call them the ratio-average reward and the time-average reward, respectively. It is known that in general these two criteria need not coincide: they may lead to different rewards and different optimal strategies for the players. The ratio-average reward is somewhat easier to study and has been used by many authors in zero-sum games and in dynamic programming. Recently, some results concerning the optimality equation for semi-Markov games with Borel state space and the ratio-average criterion were given in [1]. However, no equivalence result has been reported so far for models with a Borel (uncountable) state space. The aim of this paper is to show the equivalence of the two expected average rewards under some geometric ergodicity conditions. At the same time, we prove that the optimality equations for the models with these criteria are the same. Our proof is based on [2] and employs basic facts from renewal theory. Certain consequences of V-geometric ergodicity enable us to apply Doob's optional sampling theorem, which is the core of the proof.
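
The two criteria described in the abstract can be written out as a rough sketch. The notation here is assumed for illustration and is not taken from the paper: x_k is the state after the k-th jump, a_k and b_k are the players' actions, r is a one-stage reward, T_n is the time of the n-th jump (with T_0 = 0), and π, γ are the strategies of players 1 and 2. The treatment of the last, incomplete stage in the time-average sum is one possible convention and may differ from the one the paper uses.

```latex
% Assumed notation (not from the source): x_k state, a_k, b_k actions,
% r one-stage reward, T_n time of the n-th jump, N(t) number of jumps by time t.
% Ratio-average reward: expected reward over n jumps / expected time of n jumps.
J_{\mathrm{ratio}}(x,\pi,\gamma)
  = \limsup_{n\to\infty}
    \frac{E_x^{\pi\gamma}\Bigl[\sum_{k=0}^{n-1} r(x_k,a_k,b_k)\Bigr]}
         {E_x^{\pi\gamma}\bigl[T_n\bigr]},
% Time-average reward: expected reward up to deterministic time t / t.
\qquad
J_{\mathrm{time}}(x,\pi,\gamma)
  = \limsup_{t\to\infty}
    \frac{E_x^{\pi\gamma}\Bigl[\sum_{k=0}^{N(t)} r(x_k,a_k,b_k)\Bigr]}{t},
\qquad
N(t) = \max\{\, n \ge 0 : T_n \le t \,\}.
```

In this form the difference lies in the normalization: the first criterion fixes the number of jumps n and divides by the expected elapsed time, while the second fixes the clock time t and lets the number of jumps N(t) be random.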

Related resources

Nonzero-Sum Stochastic Games

This paper extends the basic work that has been done on zero-sum stochastic games to those that are nonzero-sum. Appropriately defined equilibrium points are shown to exist for both the case where the players seek to maximize the total value of their discounted period rewards and the case where they wish to maximize their average reward per period. For the latter case, conditions required on the...

Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games

In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermediate reward to be the difference between our score and our opponent’s score; the “true” reward of a win, loss, or tie is determined at the end of a game by applying a threshold function to the cumulative intermediate re...

Structural approximations in discounted semi-Markov games

We consider the problem of approximating the values and the equilibria in two-person zero-sum discounted semi-Markov games with infinite horizon and compact action spaces, when several uncertainties are present about the parameters of the model. Specifically: on the one hand, we study approximations made on the transition probabilities, the discount factor and the reward functions when the state ...

Semi-Markov Decision Processes

Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given, constraints and variance sensitivity are also discussed.

A TRANSITION FROM TWO-PERSON ZERO-SUM GAMES TO COOPERATIVE GAMES WITH FUZZY PAYOFFS

In this paper, we deal with games with fuzzy payoffs. We prove that players who are playing a zero-sum game with fuzzy payoffs against Nature are able to increase their joint payoff, and hence their individual payoffs, by cooperating. It is shown that a cooperative game with the fuzzy characteristic function can be constructed via the optimal game values of the zero-sum games with fuzzy payoff...

Publication date: 2003